# Efficient Attention Mechanism
Seerattention Decode Qwen3 4B AttnGates
MIT
Provides the AttnGate weights for the decoding phase described in the SeerAttention-R paper, for inference with the Qwen3-4B model
Large Language Model
Transformers

SeerAttention
4,295
1
Modernbert Base Squad2 V0.2
Apache-2.0
QA model fine-tuned from ModernBERT-base-nli, supporting long-context processing
Question Answering System
Transformers

Praise2112
42
2
Mistral 7B Instruct V0.2 Sparsity 30 V0.1
Apache-2.0
A 30%-sparse version of Mistral-7B-Instruct-v0.2, the improved instruction fine-tuned successor to Mistral-7B-Instruct-v0.1. Sparsity is achieved with the Wanda pruning method, which requires no retraining, while maintaining competitive performance.
Large Language Model
Transformers

wang7776
75
1
Nystromformer 4096
Long-sequence Nyströmformer model trained on the WikiText-103 v1 dataset, supporting sequences of up to 4096 tokens
Large Language Model
Transformers

uw-madison
74
3
Nystromformer 2048
Nyströmformer model trained on the WikiText-103 dataset, supporting long-sequence processing (2048 tokens)
Large Language Model
Transformers

uw-madison
38
1
Long T5 Tglobal Base
Apache-2.0
LongT5 is a text-to-text Transformer model based on the T5 architecture that uses a transient-global attention mechanism to process long input sequences efficiently
Large Language Model
English
google
71.38k
42
Deit Tiny Distilled Patch16 224
Apache-2.0
This model is a distilled version of the Data-efficient image Transformer (DeiT), pretrained and fine-tuned on ImageNet-1k at 224x224 resolution, efficiently learning from a teacher model through distillation.
Image Classification
Transformers

facebook
6,016
6
Bart Base Cnn R2 18.7 D23 Hybrid
Apache-2.0
This is a pruned and optimized BART-base model, specifically fine-tuned on the CNN/DailyMail dataset for summarization tasks.
Text Generation
Transformers English

echarlaix
18
0
Chinese Bigbird Mini 1024
Apache-2.0
This is a Chinese pre-trained model based on the BigBird architecture, optimized for Chinese text processing and able to handle long text sequences.
Large Language Model
Transformers Chinese

Lowin
14
1